Report on exploratory analysis and visualization.

After the wrangling phase and all necessary cleaning of the dataset, exploratory analysis was carried out and several insights were generated. This report details those insights.

The dog stages were analysed against the image predictions. Of the first predictions, 1,477 were valid dog predictions, 58 of which were in the doggo dog stage. Of the second predictions, 1,495 were valid dog predictions, 58 of which were in the doggo dog stage. Of the third predictions, 1,446 were valid dog predictions, 53 of which were in the doggo dog stage.

Within the dog stage category, pupper ranks highest with 212 occurrences, followed by doggo with 74 and puppo with 23. Floofer has the lowest occurrence, at 8.

The majority of the tweets were sent from Twitter for iPhone, with 1,955 occurrences, representing 98% of the tweet source category.
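
The stage counts and the source share reported above could be reproduced along the following lines. This is only a minimal sketch using the Python standard library: the field names `dog_stage` and `source` are assumptions about the cleaned table, and the four toy rows are invented for illustration, not taken from the dataset.

```python
from collections import Counter

# Toy records standing in for the cleaned tweet table.
# Field names and values are assumptions made for illustration only.
tweets = [
    {"dog_stage": "pupper", "source": "Twitter for iPhone"},
    {"dog_stage": "doggo", "source": "Twitter for iPhone"},
    {"dog_stage": "pupper", "source": "Twitter Web Client"},
    {"dog_stage": None, "source": "Twitter for iPhone"},
]


def stage_counts(rows):
    """Count tweets per dog stage, skipping rows with no stage."""
    return Counter(r["dog_stage"] for r in rows if r["dog_stage"])


def source_share(rows):
    """Return each source's share of all tweets, as a percentage."""
    counts = Counter(r["source"] for r in rows)
    total = sum(counts.values())
    return {src: 100 * n / total for src, n in counts.items()}


stages = stage_counts(tweets)
shares = source_share(tweets)
print(stages.most_common())  # stages ordered from most to least frequent
print(shares)
```

With a pandas dataframe, a value count on the corresponding column would likely serve the same purpose.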

The figure below highlights the correlation between the retweet count and the favorite count. The correlation is positive, which makes sense: the more a tweet is retweeted, the larger the audience it reaches and the more potential likes it can receive.

linegraph.png
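
To put a number on the relationship between retweets and favorites, a Pearson correlation coefficient can be computed. The sketch below is one possible way to do it with the standard library, not necessarily the method used for the figure. The two short lists are made-up values chosen only to show the calculation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented illustrative values, not figures from the dataset.
retweets = [10, 50, 200, 1000]
favorites = [40, 180, 650, 3100]

r = pearson(retweets, favorites)
print(f"r = {r:.3f}")  # a value close to +1 indicates a strong positive correlation
```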

The retweet count and favorite count, categorized by prediction validity, are plotted on a scatter plot to show the relationship between the three variables. Both the retweet count (x-axis) and the favorite count (y-axis) are set to a log scale to better show that relationship. The figure is shown below.

scatterplot.png
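
Setting both axes to a log scale is equivalent to plotting the base-10 logarithm of each count. The sketch below shows that transformation with the points grouped by prediction validity. It assumes a boolean field named `p1_dog` marks a valid dog prediction, and the rows are invented. Counts of zero are skipped because they have no logarithm, which is also why such points cannot appear on a log-scaled axis.

```python
import math
from collections import defaultdict

# Invented rows; the field names are assumptions about the merged table.
points = [
    {"retweet_count": 10, "favorite_count": 100, "p1_dog": True},
    {"retweet_count": 1000, "favorite_count": 10000, "p1_dog": True},
    {"retweet_count": 0, "favorite_count": 5, "p1_dog": False},
    {"retweet_count": 100, "favorite_count": 1000, "p1_dog": False},
]


def log_points_by_validity(rows):
    """Group (log10 retweets, log10 favorites) pairs by prediction validity."""
    grouped = defaultdict(list)
    for row in rows:
        x, y = row["retweet_count"], row["favorite_count"]
        if x <= 0 or y <= 0:
            continue  # log10 is undefined for zero, so drop these points
        grouped[row["p1_dog"]].append((math.log10(x), math.log10(y)))
    return dict(grouped)


log_points = log_points_by_validity(points)
for valid, pairs in log_points.items():
    print(valid, pairs)
```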

For predictions with a doggo dog stage, the mean confidence is highest in the first prediction, at 60%, i.e. 60% confidence in the accuracy of the prediction. The third prediction has the lowest mean confidence, at about 6%. For predictions with a floofer dog stage, the mean confidence is again highest in the first prediction, at 58%, and lowest in the third prediction, at about 6%. Lastly, for predictions with a pupper dog stage, the mean confidence is highest in the first prediction, at 61%, and lowest in the third prediction, at about 6%.
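
The per-stage mean confidences can be obtained by grouping rows by dog stage and averaging each confidence column. The following is a hedged sketch of that grouping. The column names `p1_conf`, `p2_conf` and `p3_conf` are assumptions, and the rows are invented, so the printed means are not the report's figures.

```python
from collections import defaultdict
from statistics import mean

# Invented rows; column names are assumed, not confirmed by the report.
rows = [
    {"dog_stage": "doggo", "p1_conf": 0.8, "p2_conf": 0.10, "p3_conf": 0.05},
    {"dog_stage": "doggo", "p1_conf": 0.6, "p2_conf": 0.20, "p3_conf": 0.05},
    {"dog_stage": "pupper", "p1_conf": 0.5, "p2_conf": 0.30, "p3_conf": 0.10},
]


def mean_conf_by_stage(rows, cols=("p1_conf", "p2_conf", "p3_conf")):
    """Return {stage: {column: mean confidence}} for the given columns."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for col in cols:
            grouped[row["dog_stage"]][col].append(row[col])
    return {
        stage: {col: mean(values) for col, values in by_col.items()}
        for stage, by_col in grouped.items()
    }


conf = mean_conf_by_stage(rows)
for stage, means in conf.items():
    print(stage, means)
```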

The image prediction dataframe contains predictions that were not dogs across the three predictions, so we can choose to assess only the predictions that were dogs. We filter out the image predictions that predicted something other than a dog, and a word cloud is then an appropriate way to display the predicted dog breeds. The breeds from the first prediction are plotted on a word cloud to give a better view of the breeds predicted. In the figure below, for the first prediction, Labrador retriever and golden retriever have the highest number of predictions, with every other breed represented by name.

wordcloud.png
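
The filtering step and the breed frequencies that feed a word cloud can be sketched as below. The field names `p1` (predicted label) and `p1_dog` (whether the label is a dog breed) are assumptions, and the rows are invented. The word cloud rendering itself is left out, since it depends on whichever plotting library is used.

```python
from collections import Counter

# Invented rows; field names are assumptions about the prediction table.
predictions = [
    {"p1": "golden_retriever", "p1_dog": True},
    {"p1": "Labrador_retriever", "p1_dog": True},
    {"p1": "seat_belt", "p1_dog": False},
    {"p1": "golden_retriever", "p1_dog": True},
]


def breed_frequencies(rows, name_col="p1", flag_col="p1_dog"):
    """Count predicted breeds, keeping only rows flagged as dogs."""
    return Counter(
        row[name_col].replace("_", " ") for row in rows if row[flag_col]
    )


freqs = breed_frequencies(predictions)
print(freqs.most_common())  # the most frequent breeds would appear largest in a word cloud
```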

The image predictions appear to be spot on, judging by the sample below, and we can confirm this against the tweet stored in the Twitter archive dataframe. For the prediction at index number 9, the tweet reads "This is Cassie. She is a college pup. Studying international doggo communication and stick theory for dog".

dog_sample.png

Finally, to display the predicted images instead of their URLs, the IPython.display module was used to render the images in HTML. The filtered dataframe of valid dog predictions was used in this scenario, so that only images predicted to be dogs, and not something else, are displayed.
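
One way to show images rather than URLs is to turn each URL into an HTML image tag and render the resulting markup. The sketch below builds that markup with the standard library only. The helper names `image_tag` and `rows_to_html`, the field names `jpg_url` and `p1`, and the example.com URLs are all hypothetical and used purely for illustration.

```python
from html import escape


def image_tag(url, width=150):
    """Return an HTML <img> tag for the given image URL."""
    return f'<img src="{escape(url, quote=True)}" width="{int(width)}">'


def rows_to_html(rows):
    """Build a small HTML table with one predicted breed and image per row."""
    body = "".join(
        f"<tr><td>{escape(row['p1'])}</td><td>{image_tag(row['jpg_url'])}</td></tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


# Hypothetical rows; the URLs are placeholders, not real image links.
dogs = [
    {"p1": "golden retriever", "jpg_url": "https://example.com/dog1.jpg"},
    {"p1": "pug", "jpg_url": "https://example.com/dog2.jpg"},
]

markup = rows_to_html(dogs)
print(markup)
```

In a notebook, passing a string like `markup` to `IPython.display.HTML` should render the images inline.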